Article 4118

Title of the article

AN APPROACH TO IMPROVING ALGORITHMS FOR COMPUTING DISTANCES BETWEEN DNA CHAINS
(BY THE EXAMPLE OF THE NEEDLEMAN – WUNSCH ALGORITHM) 

Authors

Mel'nikov Boris Feliksovich, Doctor of physical and mathematical sciences, professor, sub-department of information systems and networks, Russian State Social University (4 Wilgelma Pika street, Moscow, Russia), bf-melnikov@yandex.ru
Trenina Marina Anatol'evna, Senior lecturer, sub-department of applied mathematics and informatics, Institute of mathematics, physics and information technologies, Togliatti State University (14 Belorusskaya street, Togliatti, Russia), trenina.m.a@yandex.ru
Kochergin Aleksandr Sergeevich, Postgraduate student, Russian State Social University (4 Wilgelma Pika street, Moscow, Russia), kocherginalexandr@gmail.com

Index UDK

004.021; 004.023; 51-76

DOI

10.21685/2072-3040-2018-1-4

Abstract

Background. In practice, it is often necessary to calculate the distance between sequences of a different nature. Similar algorithms are used in bioinformatics to compare sequenced genetic chains. Because of the large dimensionality of such chains, we have to use heuristic algorithms that give approximate results. Therefore, the problem arises of estimating the quality of the metrics (distances) used, which, according to its results, one can conclude that the algorithm is applicable to various studies. Purpose of the study – improving the quality of the distance between thee long strings.
Materials and methods. To compare the genetic chains taken from the open bank NCBI, we offer a heuristic algorithm developed on the basis of the Needleman – Wunsch algorithm. After implementing this algorithm, a special function with three parameters to the obtained metric values is applied, which are determined by the method of gradient descent.
Results. A qualitative evaluation of the operation of algorithms for calculating the distance between DNA strings was obtained. An approach to the improvement of such algorithms was developed.
Conclusions. It was proposed to improve the Needleman – Wunsch algorithm for  comparison of string sequences, and also formulate an approach to improving other algorithms for constructing metrics on long lines.

Key words

measure of the similarity of DNA sequences, heuristic algorithms, the Needleman – Wunsch algorithm

 Download PDF
References

1. Ayala F., Kayger Dzh. Sovremennaya genetika: per. s angl.: v 3 t. [Modern genetics: translation from English: in 3 volumes]. Moscow: Mir, 1987, vol. 1, 295 p.
2. Hamming R. W. The Bell System Technical Journal. 1950, vol. 29 (2), pp. 147–160.
3. Levenshteyn V. I. Doklady Akademii nauk SSSR [Reports of the USSR Academy of Sciences]. 1965, vol. 163.4, pp. 845–848.
4. Kormen T. Leyzerson Ch., Rivest R., Shtayn K. Algoritmy. Postroenie i analiz [Algorithms. Building and analysis]. Moscow: Vil'yams, 2005, 1296 p.
5. Needleman S., Wunsch C. Journal of Molecular Biology. 1970, vol. 48 (3), pp. 443–453.
6. Smith T. F., Waterman M. S. Journal of Molecular Biology. 1981, vol. 147, pp. 195–197.
7. Hromkovic J. Algorithmics for Hard Problems. Introduction to Combinatorial Optimization, Randomization, Approximation, and Heuristics. Germany: Springer, 2003, 538 p.
8. Mel'nikov B., Romanov N. Teoreticheskie problemy informatiki i ee prilozheniy [Theoretical problems of informatics and its applications]. 2001, vol. 4, pp. 81–87.
9. Melnikov B., Radionov A., Moseev A., Melnikova E. ICSOFT, Technologies, Proceedings 1st International Conference on Software and Data Technologies. Germany: Springer, 2007, pp. 272–279.
10. Mel'nikov B. F., Panin A. G. Vektor nauki Tol'yattinskogo gosudarstvennogo universiteta [The scientific vector of Togliatti State University]. 2012, no. 4 (22), pp. 83–86.
11. Akho A., Ul'man Dzh. Teoriya sintaksicheskogo analiza, perevoda i kompilyatsii. Sintaksicheskiy analiz [The theory of syntactical analysis, translation and compilation]. Moscow: Kniga po Trebovaniyu, 2012, vol. 1, 613 p.
12. Home – Nucleotide – NCBI. Available at: https://www.ncbi.nlm.nih.gov/nuccore
13. Pages H., Aboyoun P., Gentleman R., DebRaoy S. Biostrings: String Objects Representing Biological Sequences and Matching Algorithms. Bioconductor. Available at: https://bioc.ism.ac.jp/packages/2.6/bioc/html/Biostrings.html
14. Pairwise_Alignment_Mammals. Available at: https://yadi.sk/d/Oa-DTAC83SCzRq
15. Frans B. M. Bonobo: The Forgotten Ape. University of California Press, 1998, 224 p.
16. Melnikov B. F., Pivneva S. V., Trifonov M. A. Informatsionnye tekhnologii i nanotekhnologii (ITNT-2017): sb. tr. III Mezhdunar. konf. i molodezhnoy shkoly [Information technologies and nanotechnologies (ITNT-2017): proceedings of III International conference and youth school]. Samarskiy natsional'nyy issledovatel'skiy universitet imeni akademika S. P. Koroleva. Samara, 2017, pp. 1640–1645.
17. Kolmogorov A. N., Fomin S. V. Elementy teorii funktsiy i funktsional'nogo analiza [Elements of the theory of fuctions and functional analysis]. MGU im. M. V. Lomonosova. Ed. 7th. Moscow: Fizmatlit, 2004, 570 p.

 

Дата создания: 13.06.2018 13:34
Дата обновления: 28.08.2018 13:43